With the recently increasing capabilities of modern vehicles, novel approaches for interaction have emerged that go beyond traditional touch-based and voice-command approaches. Hand gestures, head pose, eye gaze, and speech have therefore been extensively investigated in automotive applications for object selection and referencing. Despite these significant advances, existing approaches mostly employ a one-model-fits-all design that is unsuitable for varying user behavior and individual differences. Moreover, current referencing approaches either consider these modalities separately or focus on a stationary situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints. In this paper, I propose a research plan for a user-centered adaptive multimodal fusion approach for referencing external objects from a moving vehicle. The proposed plan aims to provide an open-source framework for user-centered adaptation and personalization using user observations and heuristics, multimodal fusion, clustering, transfer of learning for model adaptation, and continuous learning, moving towards trusted human-centered artificial intelligence.
Some researchers have focused on studying drivers' cognitive behavior and mental load while driving. Adaptive interfaces that vary with mental and perceptual load levels could help reduce accidents and enhance the driver's experience. In this paper, we analyze the effects of mental workload and perceptual load on psychophysiological dimensions and provide a machine learning-based framework for mental and perceptual load estimation in a dual-task scenario for in-vehicle interaction (https://github.com/amrgomaaelhady/mwl-pl-estimator). We use off-the-shelf, non-intrusive sensors that can easily be integrated into vehicle systems. Our statistical analysis shows that while mental workload influences some psychophysiological dimensions, perceptual load shows little influence. Furthermore, we classify mental and perceptual load levels through the fusion of these measurements, moving towards a real-time adaptive in-vehicle interface that is personalized to user behavior and driving conditions. We report up to 89% mental workload classification accuracy and provide a real-time, minimally intrusive solution.
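The fusion-then-classification idea above can be sketched in a few lines. This is a minimal illustrative sketch, not the released framework: the feature names, centroid values, and the nearest-centroid rule are all invented for demonstration.

```python
# Illustrative sketch: fuse two physiological channels into one feature
# vector and classify the load level with a simple nearest-centroid rule.
# Feature layout and centroid values are hypothetical.

def fuse_features(heart_rate_stats, pupil_stats):
    """Concatenate per-channel feature vectors into one fused vector."""
    return list(heart_rate_stats) + list(pupil_stats)

def nearest_centroid(sample, centroids):
    """Return the label of the centroid closest to the fused sample."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(centroids, key=lambda label: dist(sample, centroids[label]))

# Hypothetical per-class centroids, assumed to be learned offline:
# [mean HR, HR variability, pupil diameter (mm), blink rate].
centroids = {
    "low":  [70.0, 5.0, 3.1, 0.2],
    "high": [95.0, 12.0, 4.0, 0.6],
}

sample = fuse_features([92.0, 11.0], [3.9, 0.5])
print(nearest_centroid(sample, centroids))  # -> high
```

In practice the paper's framework would use trained classifiers over many sensor features; the point here is only the shape of the fusion step.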
Over the past decades, the addition of hundreds of sensors to modern vehicles has led to an exponential increase in their capabilities. This enables novel modes of interaction beyond traditional touch-based and voice-command approaches, such as emotion recognition, head rotation, eye gaze, and pointing gestures. Although gaze and pointing gestures have previously been used for referencing objects inside and outside vehicles, the multimodal interaction and fusion of these gestures has so far not been extensively studied. We propose a novel learning-based multimodal fusion approach for referencing outside-the-vehicle objects while maintaining a long driving route in a simulated environment. The proposed multimodal approach outperforms single-modality approaches across multiple aspects and conditions. Moreover, we show the possibility of exploiting behavioral differences between users when completing the referencing task to realize an adaptable personalized system for each driver. We propose personalization techniques based on the transfer-of-learning concept for very small data sizes to enhance prediction and adapt to individualistic referencing behavior. Our code is publicly available at https://github.com/amr-gomaa/ML-PersRef.
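A toy sketch of the small-data personalization idea: start from a generic model and nudge it toward one driver's behavior using only a handful of that driver's samples. This is not the released ML-PersRef code; the one-parameter "model" and learning rate are invented for illustration.

```python
# Illustrative transfer-style adaptation: shift a generic model's
# pointing-offset parameter toward an individual driver's mean error
# using very few personal samples.

def personalize(generic_offset, driver_samples, lr=0.5, epochs=20):
    """Incrementally move the offset toward the driver's observed errors."""
    offset = generic_offset
    for _ in range(epochs):
        for s in driver_samples:
            # Small step toward each personal sample (scaled by sample count).
            offset += lr * (s - offset) / len(driver_samples)
    return offset
```

With only two personal samples at 2.0 degrees, the generic offset of 0.0 converges close to 2.0, mimicking how a small personal dataset can adapt a pretrained model.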
Convolutional neural networks (CNNs) are one of the most successful computer vision systems to solve object recognition. Furthermore, CNNs have major applications in understanding the nature of visual representations in the human brain. Yet it remains poorly understood how CNNs actually make their decisions, what the nature of their internal representations is, and how their recognition strategies differ from humans. Specifically, there is a major debate about the question of whether CNNs primarily rely on surface regularities of objects, or whether they are capable of exploiting the spatial arrangement of features, similar to humans. Here, we develop a novel feature-scrambling approach to explicitly test whether CNNs use the spatial arrangement of features (i.e. object parts) to classify objects. We combine this approach with a systematic manipulation of effective receptive field sizes of CNNs as well as minimal recognizable configurations (MIRCs) analysis. In contrast to much previous literature, we provide evidence that CNNs are in fact capable of using relatively long-range spatial relationships for object classification. Moreover, the extent to which CNNs use spatial relationships depends heavily on the dataset, e.g. texture vs. sketch. In fact, CNNs even use different strategies for different classes within heterogeneous datasets (ImageNet), suggesting CNNs have a continuous spectrum of classification strategies. Finally, we show that CNNs learn the spatial arrangement of features only up to an intermediate level of granularity, which suggests that intermediate rather than global shape features provide the optimal trade-off between sensitivity and specificity in object classification. These results provide novel insights into the nature of CNN representations and the extent to which they rely on the spatial arrangement of features for object classification.
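The core of a feature-scrambling test can be illustrated compactly: permute non-overlapping patches of an image so that local features survive while their long-range spatial arrangement is destroyed. This is an illustrative sketch, not the authors' pipeline; the patch representation (a nested list standing in for an image) is a simplification.

```python
# Illustrative patch scrambling: split a 2-D grid into patch x patch
# blocks and permute them with a fixed seed. A CNN relying only on local
# features should be unaffected; one using spatial arrangement should not.
import random

def scramble_patches(image, patch, seed=0):
    """Permute the non-overlapping patch x patch blocks of a 2-D grid."""
    h, w = len(image), len(image[0])
    blocks = [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in range(0, h, patch) for c in range(0, w, patch)
    ]
    random.Random(seed).shuffle(blocks)
    out = [[0] * w for _ in range(h)]
    i = 0
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            for dr, brow in enumerate(blocks[i]):
                out[r + dr][c:c + patch] = brow
            i += 1
    return out
```

Varying the patch size corresponds to probing different granularities of spatial structure, in the spirit of the effective-receptive-field manipulation described above.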
The unavailability of parallel corpora for training text style transfer (TST) models is a very challenging yet common scenario. TST models also implicitly need to preserve the content while transforming a source sentence into the target style. To tackle these problems, an intermediate representation is often constructed that is devoid of style while still preserving the meaning of the source sentence. In this work, we study the usefulness of the Abstract Meaning Representation (AMR) graph as the intermediate style-agnostic representation. We posit that semantic notations like AMR are a natural choice for an intermediate representation. Hence, we propose T-STAR: a model comprising two components, a text-to-AMR encoder and an AMR-to-text decoder. We propose several modeling improvements to enhance the style agnosticity of the generated AMR. To the best of our knowledge, T-STAR is the first work that uses AMR as an intermediate representation for TST. Through thorough experimental evaluation, we show that T-STAR significantly outperforms state-of-the-art techniques, achieving on average 15.2% higher content preservation with negligible loss (approx. 3%) in style accuracy. Through a detailed human evaluation with 90,000 ratings, we also show that T-STAR has up to 50% fewer hallucinations than state-of-the-art TST models.
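The two-component pipeline can be caricatured with toy stand-ins for the encoder and decoder: the intermediate representation keeps content tokens and drops style markers, and the decoder re-renders content in the target style. Everything here (the word lists, the token-level "parsing") is a deliberate simplification, not the paper's neural components.

```python
# Toy T-STAR-shaped pipeline: text -> style-agnostic intermediate -> text.
# Real T-STAR uses a neural text-to-AMR encoder and AMR-to-text decoder;
# here the intermediate is just the content tokens.

STYLE_WORDS = {"positive": ["wonderful"], "negative": ["awful"]}

def to_intermediate(sentence):
    """Stand-in for the text-to-AMR encoder: strip style markers."""
    flat = {w for ws in STYLE_WORDS.values() for w in ws}
    return [t for t in sentence.split() if t not in flat]

def from_intermediate(content, style):
    """Stand-in for the AMR-to-text decoder: re-add a target-style marker."""
    return " ".join([STYLE_WORDS[style][0]] + content)

print(from_intermediate(to_intermediate("awful service today"), "positive"))
# -> wonderful service today
```

The point of the sketch is the invariant: the intermediate representation is identical for both styles, which is exactly the style agnosticity the paper optimizes for.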
AMR parsing is the task of automatically mapping a sentence to an AMR semantic graph. We focus on the breadth-first strategy for this task, which was proposed recently and achieved better performance than other strategies. However, current models under this strategy only \emph{encourage} the model to produce the AMR graph in breadth-first order but \emph{cannot guarantee} it. To solve this problem, we propose a new architecture that \emph{guarantees} that parsing will strictly follow the breadth-first order. In each parsing step, we introduce a \textbf{focused parent} vertex and use this vertex to guide the generation. With the help of this new architecture and some other improvements to the sentence and graph encoders, our model obtains better performance on both the AMR 1.0 and 2.0 datasets.
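The guarantee can be made concrete with a small sketch: if generation is driven by popping a focused-parent vertex from a queue and emitting all of its children before moving on, breadth-first order holds by construction rather than by encouragement. This is an illustrative reduction, not the paper's neural architecture; the example graph uses AMR-style concept names for flavor.

```python
# Strictly breadth-first generation driven by a "focused parent" queue.
# Re-entrant vertices (shared children, common in AMR) are emitted once.
from collections import deque

def bfs_generate(children, root):
    order, queue, seen = [root], deque([root]), {root}
    while queue:
        focused_parent = queue.popleft()  # vertex guiding this step
        for child in children.get(focused_parent, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

# Toy AMR-like graph for "the boy wants to go": go-02 re-enters boy.
graph = {"want-01": ["boy", "go-02"], "go-02": ["boy"]}
print(bfs_generate(graph, "want-01"))  # -> ['want-01', 'boy', 'go-02']
```

Any sequence produced this way is breadth-first by construction, which is the property the proposed architecture enforces at the model level.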
In this paper, we provide a detailed description of our system in the CAMRP-2022 evaluation. We propose a two-stage approach for Chinese AMR parsing with alignment generation, consisting of a concept-prediction stage and a relation-prediction stage. Our model achieves aligned F1 scores of 0.7756 and 0.7074 on the CAMR 2.0 test set and the blind test set of CAMRP-2022, respectively. We also analyze the results and the limitations of our current method, such as the error-propagation and class-imbalance problems. The code and trained models will be released at https://github.com/pkunlp-icler/two-stage-camrp for reproduction.
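The two-stage decomposition can be sketched with stubbed stages: stage one maps tokens to concepts, stage two predicts relations over the predicted concepts. The lexicon lookup and the attach-to-head rule are invented placeholders for the paper's learned models, shown only to make the pipeline shape explicit.

```python
# Two-stage parsing sketch with stubbed models. Stage 1 output feeds
# stage 2, which is also where error propagation (noted in the paper's
# limitations) enters: wrong concepts yield wrong relations.

def predict_concepts(tokens, concept_lexicon):
    """Stage 1 (stub): map surface tokens to AMR concepts via lookup."""
    return [(t, concept_lexicon[t]) for t in tokens if t in concept_lexicon]

def predict_relations(concepts):
    """Stage 2 (stub): attach every later concept to the first as :argN."""
    head = concepts[0][1]
    return [(head, ":arg%d" % i, c) for i, (_, c) in enumerate(concepts[1:])]

lexicon = {"想": "want-01", "去": "go-02"}
concepts = predict_concepts(["想", "去"], lexicon)
print(predict_relations(concepts))  # -> [('want-01', ':arg0', 'go-02')]
```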
Opinion summarization is the task of creating summaries that capture the popular opinions in user reviews. In this paper, we introduce Geodesic Summarizer (GeoSumm), a novel system that performs unsupervised extractive opinion summarization. GeoSumm involves an encoder-decoder-based representation model that represents text as a distribution over latent semantic units. GeoSumm generates these representations by performing dictionary learning over pre-trained text representations at multiple decoder layers. We then use these representations to quantify the relevance of review sentences using a novel geodesic distance-based scoring mechanism. We use the relevance scores to identify popular opinions in order to compose general and aspect-specific summaries. Our proposed model, GeoSumm, achieves state-of-the-art performance on three opinion summarization datasets. We perform additional experiments to analyze the functioning of the model and showcase its generalization ability across different domains.
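One concrete instance of geodesic scoring over distributions is the Fisher-Rao (Bhattacharyya-angle) distance on the probability simplex, shown below as an illustration. The abstract does not specify which geodesic GeoSumm uses, so treat the metric choice and the toy distributions as assumptions.

```python
# Illustrative geodesic relevance scoring: sentences are distributions
# over latent semantic units; relevance is (negated) Fisher-Rao geodesic
# distance to a reference distribution, so closer sentences score higher.
from math import acos, sqrt

def geodesic_distance(p, q):
    """Fisher-Rao geodesic distance between two discrete distributions."""
    bc = sum(sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * acos(min(1.0, bc))              # clamp guards rounding

def relevance(sentence_dist, reference_dist):
    """Higher score = closer to the reference (e.g. corpus-level) opinion."""
    return -geodesic_distance(sentence_dist, reference_dist)
```

Ranking review sentences by this score and keeping the top ones is the extractive step; aspect-specific summaries would swap in an aspect-level reference distribution.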
Although deep neural networks (DNNs) have become the backbone technology of several ubiquitous applications, their deployment in resource-constrained machines, such as Internet of Things (IoT) devices, remains challenging. To satisfy the resource requirements of such a paradigm, collaborative deep inference with the IoT was introduced. However, the distribution of DNN networks suffers from severe data leakage. Various threats have been identified, including black-box attacks, where a malicious participant can recover arbitrary inputs fed into their device. Although many countermeasures aim to achieve privacy-preserving DNNs, most of them incur additional computation and lower accuracy. In this paper, we present an approach that targets the security of collaborative deep inference by rethinking the distribution strategy, without sacrificing model performance. In particular, we examine the different DNN partitions that make the model vulnerable to black-box threats and derive the amount of data that should be allocated per device to hide the ownership of the original input. We formulate this methodology as an optimization in which we establish a trade-off between the latency of co-inference and data-level privacy. Next, to generalize the optimal solution, we shape our approach as a reinforcement learning (RL) design that supports heterogeneous devices as well as multiple DNNs/datasets.
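The latency/privacy trade-off can be phrased as a tiny constrained selection problem, sketched below. The candidate split points, their latencies, and the scalar privacy proxy are all invented for illustration; the paper formulates this as a richer optimization and ultimately an RL design.

```python
# Toy trade-off: among candidate DNN split points, pick the one with the
# lowest co-inference latency whose privacy score meets a threshold
# (i.e. enough computation stays on-device to hide the raw input).

def best_partition(candidates, min_privacy):
    """candidates: list of (latency_ms, privacy_score) per split point.
    Returns the index of the fastest feasible split, or None."""
    feasible = [(lat, i) for i, (lat, priv) in enumerate(candidates)
                if priv >= min_privacy]
    return min(feasible)[1] if feasible else None

# Hypothetical splits: earlier splits are faster but leak more.
splits = [(40.0, 0.2), (55.0, 0.7), (90.0, 0.9)]
print(best_partition(splits, 0.5))  # -> 1
```

An RL formulation replaces this exhaustive scan with a learned policy, which is what makes the approach scale to heterogeneous devices and multiple DNNs/datasets.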
The development of safety-oriented research ideas and applications requires fine-grained vehicle trajectory data that not only have high accuracy but also capture a substantial number of safety-critical events. This paper introduces the CitySim dataset, whose core design purpose is to facilitate safety-oriented research and applications. CitySim's vehicle trajectories are extracted from 1140 minutes of drone video recorded at 12 different locations. It covers a variety of road geometries, including freeway basic segments, weaving segments, expressway merge/diverge segments, signalized intersections, stop-controlled intersections, and intersections without sign/signal control. CitySim trajectories are generated through a five-step procedure that ensures trajectory accuracy. In addition, the dataset provides rotated bounding box information for vehicles, which is demonstrated to improve safety evaluations. Compared with other video-based trajectory datasets, the CitySim dataset contains cut-in, merge, and diverge events of considerably higher severity. Moreover, CitySim facilitates research on digital-twin applications by providing relevant assets, such as 3D base maps and signal timings for the recorded locations. These features enable a more comprehensive basis for safety research and applications, such as autonomous-vehicle safety and location-based safety analysis. The dataset is available online at https://github.com/ozheng1993/ucf-sst-citysim-dataset.
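Why rotated boxes matter can be shown in a few lines: a heading-aware box is fully described by center, size, and heading, and its corners capture the vehicle's true footprint in turns, which an axis-aligned box cannot. The function below is a generic geometric sketch, not code from the dataset's pipeline.

```python
# Corners of a rotated vehicle bounding box from center (cx, cy),
# length/width, and heading in degrees (counter-clockwise from +x).
from math import cos, sin, radians

def rotated_box_corners(cx, cy, length, width, heading_deg):
    """Return the four (x, y) corners of the rotated rectangle."""
    c, s = cos(radians(heading_deg)), sin(radians(heading_deg))
    half = [( length / 2,  width / 2), ( length / 2, -width / 2),
            (-length / 2, -width / 2), (-length / 2,  width / 2)]
    # Rotate each half-extent offset, then translate to the center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]
```

Surrogate safety measures such as time-to-collision can then be computed against these corners rather than against an inflated axis-aligned box, which is the improvement the dataset's rotated-box annotations enable.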